Abstract

A typical approach in estimating the learning rate of a regularized learning scheme is to bound the approximation error by the sum of the sampling error, the hypothesis error and the regularization error. Using a reproducing kernel space that satisfies the linear representer theorem brings the advantage of discarding the hypothesis error from the sum automatically. Following this direction, we illustrate how reproducing kernel Banach spaces with the $\ell^1$ norm can be applied to improve the learning rate estimate of $\ell^1$-regularization in machine learning.

Keywords: reproducing kernel Banach spaces, sparse learning, regularization, least square regression, learning rate, the representer theorem

1 Introduction

A class of reproducing kernel Banach spaces (RKBS) with the $\ell^1$ norm that satisfies the linear representer theorem was recently constructed in [14]. The purpose of this note is to illustrate how the obtained spaces can be applied to estimate the learning rate of the $\ell^1$-regularized least square regression in machine learning.

A general coefficient-based regularization of the least square regression has the form

$$\min_{c \in \mathbb{R}^m} \frac{1}{m} \sum_{j=1}^{m} \left| K^{\mathbf{x}}(x_j) c - y_j \right|^2 + \lambda \phi(c), \tag{1.1}$$

where $\mathbf{x} := \{x_j : j \in \mathbb{N}_m\}$ with $\mathbb{N}_m := \{1, 2, \dots, m\}$ is the sequence of sampling points from an input space $X$, $y_j \in Y \subseteq \mathbb{R}$ is the observed data on $x_j$, $\lambda$ is a positive regularization parameter, $\phi$ is a nonnegative regularization function on the coefficient column vector $c$, and with a chosen function $K : X \times X \to \mathbb{R}$, $K^{\mathbf{x}}(x)$ is the $1 \times m$ row vector $(K(x_j, x) : j \in \mathbb{N}_m)$.
When $K$ is a positive-definite reproducing kernel on $X$ and

$$\phi(c) := c^T K[\mathbf{x}] c, \tag{1.2}$$

where $K[\mathbf{x}]$ is the $m \times m$ matrix defined by

$$(K[\mathbf{x}])_{j,k} := K(x_k, x_j), \quad j, k \in \mathbb{N}_m,$$
it follows from the celebrated representer theorem [?] that (1.1) is the classical regularization network, which has been extensively studied in the literature [?, ?, 13, 19]. Estimates for the learning rate of the regularization network can be found, for example, in [?, ?, 23]. Learning rates for (1.1) when $\phi(c) = \sum_{j=1}^{m} |c_j|^p$ for $1 < p < 2$ and $p = 2$ were respectively obtained in [18] and [16]. The linear programming regularization where $\phi(c)$ is the $\ell^1$ norm $\|c\|_1$ of $c$ has recently attracted much attention. The increasing interest is mainly brought by the progress of the lasso in statistics [?] and compressive sensing [2, ?], in which $\ell^1$-regularization is able to yield sparse representation of the resulting minimizer, a desirable feature in model selection. Moreover, the $\ell^1$-regularization is particularly robust to non-Gaussian additive noise such as impulsive noise [1, ?].

Without making use of a reproducing kernel space, the recent references [11], [20] established estimates of the learning rate for the $\ell^1$-regularized least square regression

$$\min_{c \in \mathbb{R}^m} \frac{1}{m} \sum_{j=1}^{m} \left| K^{\mathbf{x}}(x_j) c - y_j \right|^2 + \lambda \|c\|_1. \tag{1.3}$$

We attempt to show that improvement on the estimates could be made if an RKBS with the $\ell^1$ norm is used. To explain how this could be done, we first introduce the popular approach [?] for learning rate estimates in machine learning.
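To make the scheme (1.3) concrete, the following minimal sketch minimizes its objective by proximal gradient descent (ISTA). The solver, the exponential kernel $e^{-|s-t|}$, the synthetic data and all function names are assumptions made for this illustration only; they are not the procedure analysed in the cited references.

```python
import numpy as np

def exponential_kernel(s, t):
    """K(s, t) = exp(-|s - t|); an illustrative kernel choice."""
    return np.exp(-np.abs(s - t))

def objective(G, y, c, lam):
    """Value of (1/m) * sum_j |K^x(x_j) c - y_j|^2 + lam * ||c||_1."""
    return np.mean((G @ c - y) ** 2) + lam * np.abs(c).sum()

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_regularized_least_squares(G, y, lam, n_iter=500):
    """Proximal gradient (ISTA) iteration for the objective above."""
    m = len(y)
    # Step size 1/L, where L = (2/m) * sigma_max(G)^2 is the Lipschitz
    # constant of the gradient of the quadratic part.
    step = m / (2.0 * np.linalg.norm(G, 2) ** 2)
    c = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = (2.0 / m) * G.T @ (G @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

# Synthetic sample: m = 20 points on [0, 1] with noiseless outputs.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x)
# Row k of G is the row vector K^x(x_k) = (K(x_j, x_k) : j = 1..m).
G = exponential_kernel(x[None, :], x[:, None])
c_hat = l1_regularized_least_squares(G, y, lam=0.01)
```

The learned predictor is then $f(t) = \sum_j \hat c_j K(x_j, t)$. For large $\lambda$ the shrinkage step sets every coefficient to zero, which is the sparsity effect mentioned above.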

A fundamental assumption in machine learning is that the sample data $\mathbf{z} := \{(x_j, y_j) : j \in \mathbb{N}_m\} \subseteq X \times Y$ consists of independent and identically distributed instances of a random variable $(x, y) \in X \times Y$ subject to an unknown probability measure $\rho$ on $X \times Y$. The performance of a predictor $f : X \to Y$ is hence measured by

$$\mathcal{E}(f) := \int_{X \times Y} |f(x) - y|^2 \, d\rho.$$

The predictor that minimizes the above error is the regression function

$$f_\rho(x) := \int_Y y \, d\rho(y|x), \quad x \in X, \tag{1.4}$$

where $\rho(y|x)$ denotes the conditional probability measure of $y$ with respect to $x$. In fact, we have for every predictor $f$ that

$$\mathcal{E}(f) = \mathcal{E}(f_\rho) + \|f - f_\rho\|^2_{L^2_{\rho_X}}, \tag{1.5}$$

where $\rho_X$ is the marginal probability measure of $\rho$ on $X$ and for $p \in [1, +\infty)$, $L^p_{\rho_X}$ denotes the Banach space of measurable functions $f$ on $X$ with respect to $\rho_X$ such that

$$\|f\|_{L^p_{\rho_X}} := \left( \int_X |f(x)|^p \, d\rho_X(x) \right)^{1/p} < +\infty.$$
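The identity (1.5) can be checked exactly on a finite model. The sketch below uses a hypothetical joint distribution on a three-point input space and a three-point output space; the numbers and names are invented for illustration only.

```python
import numpy as np

# Hypothetical finite model: X has three points, Y = {-1, 0, 2}.
# joint[i, k] = rho({x_i} x {y_k}); rows index x, columns index y.
y_values = np.array([-1.0, 0.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.05, 0.20],
                  [0.15, 0.10, 0.05]])

rho_X = joint.sum(axis=1)              # marginal measure on X
f_rho = (joint @ y_values) / rho_X     # conditional mean, as in (1.4)

def risk(f_on_X):
    """E(f): the integral of |f(x) - y|^2 against rho, as a finite sum."""
    residual = f_on_X[:, None] - y_values[None, :]
    return float((joint * residual ** 2).sum())

f = np.array([0.3, -0.7, 1.5])         # an arbitrary predictor on X
lhs = risk(f)
# E(f_rho) plus the squared L^2(rho_X) distance between f and f_rho.
rhs = risk(f_rho) + float((rho_X * (f - f_rho) ** 2).sum())
```

Here `lhs` and `rhs` agree up to rounding, and `risk(f_rho)` is never larger than `risk(f)`.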



The formula (1.4), though attractive, is only of theoretical value as $\rho$ is unknown. A practical way is to find a minimizer $c_{\mathbf{z},\lambda}$ of (1.1) and hope that

$$f_{\mathbf{z},\lambda}(x) := K^{\mathbf{x}}(x) c_{\mathbf{z},\lambda}, \quad x \in X \tag{1.6}$$

will be competitive with $f_\rho$ in the sense that the approximation error $\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho)$ would be small. To be more precise, for the learning scheme (1.1) to be useful in practice, this error should converge to zero fast in probability as the number of sampling points increases.
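The approach in [?] works by introducing intermediate functions between $f_{\mathbf{z},\lambda}$ and $f_\rho$ that are from a Banach space $\mathcal{B}$ of functions on $X$ with the properties that $K(x, \cdot) \in \mathcal{B}$ for all $x \in X$ and for all pairwise distinct $x_j \in X$, $j \in \mathbb{N}_m$ and $c \in \mathbb{R}^m$

$$\psi(\|K^{\mathbf{x}}(\cdot) c\|_{\mathcal{B}}) = \phi(c),$$

for some nonnegative function $\psi$. Here $\|\cdot\|_{\mathcal{B}}$ is the norm on $\mathcal{B}$. Let $g$ be an arbitrary function from such a space $\mathcal{B}$ and set for each function $f : X \to \mathbb{R}$

$$\mathcal{E}_{\mathbf{z}}(f) := \frac{1}{m} \sum_{j=1}^{m} |f(x_j) - y_j|^2.$$

The approximation error $\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho)$ can then be decomposed into the sum of four quantities

$$\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) = \mathcal{S}(\mathbf{z},\lambda,g) + \mathcal{P}(\mathbf{z},\lambda,g) + \mathcal{D}(\lambda,g) - \lambda \psi(\|f_{\mathbf{z},\lambda}\|_{\mathcal{B}}), \tag{1.7}$$

where

$$\mathcal{S}(\mathbf{z},\lambda,g) := \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) + \mathcal{E}_{\mathbf{z}}(g) - \mathcal{E}(g),$$

$$\mathcal{P}(\mathbf{z},\lambda,g) := \left( \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) + \lambda \psi(\|f_{\mathbf{z},\lambda}\|_{\mathcal{B}}) \right) - \left( \mathcal{E}_{\mathbf{z}}(g) + \lambda \psi(\|g\|_{\mathcal{B}}) \right),$$

$$\mathcal{D}(\lambda,g) := \mathcal{E}(g) - \mathcal{E}(f_\rho) + \lambda \psi(\|g\|_{\mathcal{B}}).$$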
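The decomposition (1.7) is a telescoping identity. As a brief check, write $f := f_{\mathbf{z},\lambda}$ (an abbreviation used only here), and let $\mathcal{S}$, $\mathcal{P}$, $\mathcal{D}$ denote the sampling, hypothesis and regularization errors defined above. Adding the three expressions gives:

```latex
\begin{align*}
\mathcal{S} + \mathcal{P} + \mathcal{D}
 &= \bigl[\mathcal{E}(f) - \mathcal{E}_{\mathbf{z}}(f) + \mathcal{E}_{\mathbf{z}}(g) - \mathcal{E}(g)\bigr] \\
 &\quad + \bigl[\mathcal{E}_{\mathbf{z}}(f) + \lambda\psi(\|f\|_{\mathcal{B}}) - \mathcal{E}_{\mathbf{z}}(g) - \lambda\psi(\|g\|_{\mathcal{B}})\bigr] \\
 &\quad + \bigl[\mathcal{E}(g) - \mathcal{E}(f_\rho) + \lambda\psi(\|g\|_{\mathcal{B}})\bigr] \\
 &= \mathcal{E}(f) - \mathcal{E}(f_\rho) + \lambda\psi(\|f\|_{\mathcal{B}}).
\end{align*}
```

Subtracting $\lambda\psi(\|f\|_{\mathcal{B}})$ from both sides yields (1.7).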

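The above three quantities are called the sampling error, the hypothesis error and the regularization error, respectively. The strategy is to choose $\mathcal{B}$ and $g$ carefully so that these three errors can be well bounded from above. When $\mathcal{B}$ is the reproducing kernel Hilbert space of a positive-definite reproducing kernel $K$ on $X$ and the regularizer $\phi$ is given by (1.2), we have $\psi(t) = t^2$, $t \in \mathbb{R}$ and by the representer theorem and the definition of $f_{\mathbf{z},\lambda}$ in (1.6) that

$$f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{B}} \left( \mathcal{E}_{\mathbf{z}}(f) + \lambda \|f\|^2_{\mathcal{B}} \right). \tag{1.8}$$

In this case, one immediately has that $\mathcal{P}(\mathbf{z},\lambda,g) \le 0$ and thus, by (1.7) and the nonnegativity of $\lambda\psi$, that

$$\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \le \mathcal{S}(\mathbf{z},\lambda,g) + \mathcal{D}(\lambda,g). \tag{1.9}$$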



For the $\ell^1$-regularization where $\phi(c) = \|c\|_1$, the space $\mathcal{B}$ chosen in [11, 20] does not satisfy the linear representer theorem. Consequently, the hypothesis error needed to be dealt with there.

A class of RKBS with the $\ell^1$ norm that satisfies the linear representer theorem was recently constructed in [14]. In Section 2, we shall follow a similar idea to construct a slightly larger RKBS with the same desirable properties. By using the constructed space, we enjoy the same advantage as that for the RKHS case of discarding the hypothesis error automatically. Moreover, the space also leads to a better estimate of the regularization error than that in [20]. Combining these two improvements and directly using the estimates of the sampling error established in [20] or [11], one immediately has a superior learning rate. As our focus is on the advantages brought by the constructed RKBS, we shall only improve the learning rate estimate of [20] in Section 3. Interested readers may follow our strategy to engage the more sophisticated sampling error estimate given in [11] to improve the learning rate therein.



2 RKBS by Borel Measures 

In this section, we construct RKBS applicable to the error analysis of the $\ell^1$-regularized least square regression. The constructed spaces are expected to have the $\ell^1$ norm and satisfy the linear representer theorem. The approach is different from the one by semi-inner products in [21, 22] as an infinite-dimensional $\ell^1$ space is neither reflexive nor strictly convex.

Suppose that the input space $X$ is a locally compact Hausdorff topological space and denote by $C_0(X)$ the space of continuous functions $f : X \to \mathbb{R}$ such that for all $\varepsilon > 0$, the set $\{x \in X : |f(x)| \ge \varepsilon\}$ is compact. We also impose the requirement that for all pairwise distinct $x_j \in X$, $j \in \mathbb{N}_m$, $m \in \mathbb{N}$, the kernel matrix $K[\mathbf{x}]$ is nonsingular. With the maximum norm $\|f\|_{C_0(X)} := \max_{x \in X} |f(x)|$, the space $C_0(X)$ is a Banach space. Its dual space is isometrically isomorphic to the space $\mathcal{M}(X)$ of all the signed Borel measures on $X$ with bounded total variation. In other words, for each continuous linear functional $T$ on $C_0(X)$, there exists a unique measure $\mu \in \mathcal{M}(X)$ such that

$$T(f) = \int_X f(x) \, d\mu(x) \quad \text{and} \quad \sup_{f \in C_0(X),\, f \ne 0} \frac{|T(f)|}{\|f\|_{C_0(X)}} = \|\mu\|, \tag{2.1}$$

where $\|\mu\|$ denotes the total variation of $\mu$.

Let $K$ be a real-valued function on $X \times X$ such that $K(\cdot, x) \in C_0(X)$ for all $x \in X$ and

$$\overline{\operatorname{span}}\{K(\cdot, x) : x \in X\} = C_0(X). \tag{2.2}$$

With such a function, we introduce the following space

$$\mathcal{B} := \left\{ f_\mu := \int_X K(t, \cdot) \, d\mu(t) : \mu \in \mathcal{M}(X) \right\} \tag{2.3}$$

with the norm

$$\|f_\mu\|_{\mathcal{B}} := \|\mu\|. \tag{2.4}$$

Recall that a vector space $V$ is called a pre-RKBS [14] on $X$ if it is a Banach space consisting of functions on $X$ such that point evaluation functionals are continuous on $V$ and such that for all $f \in V$, $\|f\|_V = 0$ if and only if $f$ vanishes everywhere on $X$.



Proposition 2.1 Suppose that $K(\cdot, x) \in C_0(X)$ for all $x \in X$ and (2.2) is satisfied. Then $\mathcal{B}$ defined by (2.3) is a pre-RKBS on $X$.



Proof: We first show that the norm (2.4) is well-defined. Let $\mu, \nu$ be two measures in $\mathcal{M}(X)$ such that $f_\mu(x) = f_\nu(x)$ for all $x \in X$. Then we get that

$$\int_X K(t, x) \, d(\mu - \nu)(t) = 0 \quad \text{for all } x \in X.$$

By the denseness condition (2.2), the above equation implies that $\mu - \nu = 0$. Thus, the measure $\mu$ associated with a function $f_\mu \in \mathcal{B}$ is unique. This proves that (2.4) is well-defined and that $\|f_\mu\|_{\mathcal{B}} = 0$ if and only if $f_\mu(x) = 0$ for all $x \in X$. Another consequence is that $\mathcal{B}$ is isometrically isomorphic to $\mathcal{M}(X)$ and is hence a Banach space. Finally, we observe for all $x_0 \in X$ and $\mu \in \mathcal{M}(X)$ that

$$|f_\mu(x_0)| = \left| \int_X K(t, x_0) \, d\mu(t) \right| \le \|K(\cdot, x_0)\|_{C_0(X)} \|\mu\| = \|K(\cdot, x_0)\|_{C_0(X)} \|f_\mu\|_{\mathcal{B}}.$$

Therefore, point evaluations are continuous linear functionals on $\mathcal{B}$. We conclude that $\mathcal{B}$ is a pre-RKBS on $X$. The proof is complete. □
Let the sampling points in $\mathbf{x}$ be pairwise distinct. By definition, $K^{\mathbf{x}}(\cdot) c \in \mathcal{B}$ for all $c \in \mathbb{R}^m$. The denseness condition (2.2) implies that $K(x_j, \cdot)$, $j \in \mathbb{N}_m$ are linearly independent. As a result,

$$\|K^{\mathbf{x}}(\cdot) c\|_{\mathcal{B}} = \|c\|_1. \tag{2.5}$$

It is in the above sense that $\mathcal{B}$ is said to possess the $\ell^1$ norm.

We next turn to the crucial linear representer theorem in $\mathcal{B}$. We say that $\mathcal{B}$ satisfies the linear representer theorem if for all continuous nonnegative loss functions $Q$ and regularizers $\psi$ with $\lim_{t \to \infty} \psi(t) = +\infty$, the regularized learning scheme

$$\inf_{f \in \mathcal{B}} Q(f(\mathbf{x})) + \lambda \psi(\|f\|_{\mathcal{B}})$$

has a minimizer $f_0$ of the form $f_0 = K^{\mathbf{x}}(\cdot) c$ for some $c \in \mathbb{R}^m$. Here, $f(\mathbf{x}) = (f(x_j) : j \in \mathbb{N}_m)^T$.

The following lemma can be proved by arguments similar to those in [14].

Lemma 2.2 The space $\mathcal{B}$ satisfies the linear representer theorem if and only if for all $\mathbf{x}$ of pairwise distinct sampling points and $y \in \mathbb{R}^m$, the minimal norm interpolation

$$\inf\{\|f\|_{\mathcal{B}} : f \in \mathcal{B},\ f(\mathbf{x}) = y\} \tag{2.6}$$

has a minimizer $f_0$ of the form $f_0 = K^{\mathbf{x}}(\cdot) c$ for some $c \in \mathbb{R}^m$.



A subspace of $\mathcal{B}$ was constructed in [14] and conditions for it to satisfy the linear representer theorem were studied. In order to make use of the results obtained there, we first introduce the subspace. Denote by $\ell^1(X)$ the subset of $\mathcal{M}(X)$ of those Borel measures that are supported on a countable subset of $X$. Thus, for each $\nu \in \ell^1(X)$, there exist some pairwise distinct points $x_j \in X$, $j \in \mathbb{I}$ where $\mathbb{I}$ is a countable index set, such that

$$\nu(A) = \sum_{x_j \in A} \nu(x_j) \quad \text{for every Borel subset } A \subseteq X.$$

Denote by $\operatorname{supp} \nu$ the countable set of points where $\nu$ is nonzero. The space $\mathcal{B}_1$ considered in [14] is

$$\mathcal{B}_1 := \left\{ \sum_{x \in \operatorname{supp} \nu} \nu(x) K(x, \cdot) : \nu \in \ell^1(X) \right\}$$

with the norm inherited from that of $\mathcal{B}$.

Put for all $x \in X$, $K_{\mathbf{x}}(x) := (K(x, x_j) : j \in \mathbb{N}_m)^T$, which is an $m \times 1$ vector in $\mathbb{R}^m$. One should not confuse $K_{\mathbf{x}}(x)$ with $K^{\mathbf{x}}(x)$. The latter is $1 \times m$ and might not even be the transpose of the former, as $K$ is not required to be symmetric. The following result about $\mathcal{B}_1$ is from [14].

Lemma 2.3 For all $y \in \mathbb{R}^m$, the minimal norm interpolation

$$\inf\{\|f\|_{\mathcal{B}_1} : f \in \mathcal{B}_1,\ f(\mathbf{x}) = y\} \tag{2.7}$$

has a minimizer $f_0$ of the form $f_0 = K^{\mathbf{x}}(\cdot) c$ for some $c \in \mathbb{R}^m$ if and only if

$$\|K[\mathbf{x}]^{-1} K_{\mathbf{x}}(x)\|_1 \le 1 \quad \text{for all } x \in X. \tag{2.8}$$

Moreover, under condition (2.8), there holds for all $c \in \mathbb{R}^m$ that

$$\|c^T K_{\mathbf{x}}(\cdot)\|_{C_0(X)} = \|c^T K[\mathbf{x}]\|_\infty, \tag{2.9}$$

where $\|\cdot\|_\infty$ is the maximum norm on $\mathbb{R}^m$.
We are ready to present the main result of this section.

Theorem 2.4 The space $\mathcal{B}$ satisfies the linear representer theorem if and only if (2.8) holds true.



Proof: Suppose that (2.8) holds true. By Lemma 2.2, to show that $\mathcal{B}$ satisfies the linear representer theorem, it suffices to show that $f_0 = K^{\mathbf{x}}(\cdot) K[\mathbf{x}]^{-1} y$ is a minimizer of (2.6). Clearly, $f_0(\mathbf{x}) = y$. Let $f_\mu$, $\mu \in \mathcal{M}(X)$, be an arbitrary function in $\mathcal{B}$ that satisfies the interpolation condition $f_\mu(\mathbf{x}) = y$. We then have for all $c \in \mathbb{R}^m$ that

$$\int_X c^T K_{\mathbf{x}}(t) \, d\mu(t) = \int_X \sum_{j=1}^{m} c_j K(t, x_j) \, d\mu(t) = \sum_{j=1}^{m} c_j f_\mu(x_j) = c^T y.$$

It follows from (2.1) that for all $c \in \mathbb{R}^m$

$$|c^T y| \le \|c^T K_{\mathbf{x}}(\cdot)\|_{C_0(X)} \|\mu\|.$$

This together with (2.9) implies that

$$\|\mu\| \ge \sup_{c \in \mathbb{R}^m,\, c \ne 0} \frac{|c^T y|}{\|c^T K_{\mathbf{x}}(\cdot)\|_{C_0(X)}} = \sup_{c \in \mathbb{R}^m,\, c \ne 0} \frac{|c^T y|}{\|c^T K[\mathbf{x}]\|_\infty} = \sup_{a \in \mathbb{R}^m,\, a \ne 0} \frac{|a^T K[\mathbf{x}]^{-1} y|}{\|a\|_\infty} = \|K[\mathbf{x}]^{-1} y\|_1.$$

Now, recall by (2.5) that $\|f_0\|_{\mathcal{B}} = \|K[\mathbf{x}]^{-1} y\|_1$ and by definition of $\|\cdot\|_{\mathcal{B}}$ that $\|f_\mu\|_{\mathcal{B}} = \|\mu\|$. These two facts combined with the above inequality imply that $\|f_\mu\|_{\mathcal{B}} \ge \|f_0\|_{\mathcal{B}}$. Thus, $f_0$ is indeed a minimizer of (2.6).

On the other hand, suppose that $\mathcal{B}$ satisfies the linear representer theorem and we want to prove (2.8). Let $y \in \mathbb{R}^m$. By Lemma 2.2, the minimal norm interpolation (2.6) has a minimizer $f_0$ of the form $f_0 = K^{\mathbf{x}}(\cdot) c$ for some $c \in \mathbb{R}^m$. Clearly, $f_0$ is also a minimizer of (2.7) because $f_0 \in \mathcal{B}_1$ and

$$\|f_0\|_{\mathcal{B}_1} = \|f_0\|_{\mathcal{B}} = \inf\{\|f\|_{\mathcal{B}} : f \in \mathcal{B},\ f(\mathbf{x}) = y\} \le \inf\{\|f\|_{\mathcal{B}_1} : f \in \mathcal{B}_1,\ f(\mathbf{x}) = y\}.$$

By Lemma 2.3, (2.8) holds true. The proof is complete. □
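The two computational ingredients of the proof, the interpolant $f_0 = K^{\mathbf{x}}(\cdot) K[\mathbf{x}]^{-1} y$ and the dual description of the $\ell^1$ norm as a supremum over the $\ell^\infty$ unit ball, can be illustrated numerically. The sketch below assumes the exponential kernel and made-up sampling points and data; the names are hypothetical.

```python
import numpy as np

def exponential_kernel(s, t):
    """K(s, t) = exp(-|s - t|)."""
    return np.exp(-np.abs(s - t))

points = np.array([-1.0, -0.2, 0.5, 1.3])   # pairwise distinct sampling points
y = np.array([0.4, -1.0, 0.7, 0.2])         # values to interpolate

# Kernel matrix with entries K[x]_{j,k} = K(x_k, x_j).
gram = exponential_kernel(points[None, :], points[:, None])
c = np.linalg.solve(gram, y)                # coefficients K[x]^{-1} y

def f0(t):
    """f_0(t) = K^x(t) c = sum_j c_j K(x_j, t)."""
    return exponential_kernel(points, t) @ c

interpolated = np.array([f0(t) for t in points])
norm_f0 = np.abs(c).sum()                   # ||f_0||_B = ||c||_1 by (2.5)

# sup_a |a^T c| / ||a||_inf is attained at a = sign(c) and equals ||c||_1.
a = np.sign(c)
dual_value = abs(a @ c) / np.abs(a).max()
```

Here `interpolated` reproduces `y`, and `dual_value` coincides with `norm_f0`, which is the last equality in the chain of suprema above.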



It will become clear in the next section that the above theorem makes $\mathcal{B}$ a useful space for error analysis of the $\ell^1$-regularized least square regression.

We present two examples of $K$ that satisfy all the assumptions, especially (2.8), in this section:

- the exponential kernel

$$K(s, t) := e^{-|s - t|}, \quad s, t \in \mathbb{R},$$

- the Brownian bridge kernel

$$K(s, t) := \min\{s, t\} - st, \quad s, t \in (0, 1).$$
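Condition (2.8), that is $\|K[\mathbf{x}]^{-1} K_{\mathbf{x}}(x)\|_1 \le 1$ for all $x$, can be probed numerically for these two kernels. The sketch below evaluates the $\ell^1$ norm on a finite grid for one hypothetical set of sampling points; it is a sanity check under these assumptions, not a proof.

```python
import numpy as np

def exponential_kernel(s, t):
    return np.exp(-np.abs(s - t))

def brownian_bridge_kernel(s, t):
    return np.minimum(s, t) - s * t

def max_l1_norm(kernel, points, grid):
    """Largest value of ||K[x]^{-1} K_x(x)||_1 over the points x in grid."""
    # Kernel matrix with entries K[x]_{j,k} = K(x_k, x_j).
    gram = kernel(points[None, :], points[:, None])
    # Column i holds K_x(x) = (K(x, x_j) : j) for x = grid[i].
    rhs = kernel(grid[None, :], points[:, None])
    coeffs = np.linalg.solve(gram, rhs)
    return np.abs(coeffs).sum(axis=0).max()

points = np.array([0.1, 0.25, 0.4, 0.55, 0.7, 0.9])  # hypothetical sample
grid = np.linspace(0.01, 0.99, 197)                   # test locations in (0, 1)

exp_value = max_l1_norm(exponential_kernel, points, grid)
bridge_value = max_l1_norm(brownian_bridge_kernel, points, grid)
```

Both values should not exceed 1 beyond rounding error. The value 1 is attained at the sampling points themselves, where $K[\mathbf{x}]^{-1} K_{\mathbf{x}}(x_k)$ is a coordinate vector.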



That these two kernels satisfy (2.8) has been proved in [14]. It remains to verify the denseness requirement (2.2). The exponential kernel is a particular case of the following result.

Proposition 2.5 If $\phi$ is a Lebesgue integrable function on $\mathbb{R}^d$ that is nonzero almost everywhere, then the function

$$K(s, t) := \int_{\mathbb{R}^d} e^{-i(s - t) \cdot \xi} \phi(\xi) \, d\xi, \quad s, t \in \mathbb{R}^d \tag{2.10}$$

satisfies that $K(\cdot, t) \in C_0(\mathbb{R}^d)$ for all $t \in \mathbb{R}^d$ and the denseness condition (2.2). So does $K(s, t) := \psi(s - t)$, $s, t \in \mathbb{R}^d$, where $\psi$ is a nontrivial continuous function on $\mathbb{R}^d$ of compact support.
Proof: That the function given by (2.10) belongs to $C_0(\mathbb{R}^d)$ for all $t \in \mathbb{R}^d$ follows from the Riemann-Lebesgue lemma. The denseness condition (2.2) for the two kernels can be proved by arguments similar to those in [14]. □



The Brownian bridge kernel is handled in a manner different from that in [14].

Proposition 2.6 The Brownian bridge kernel satisfies (2.2).



Proof: Clearly, for the Brownian bridge kernel, $K(\cdot, t)$ is continuous for all $t \in (0, 1)$. Let $\nu$ be a Borel measure on $X := (0, 1)$ such that

$$\int_X K(s, t) \, d\nu(s) = 0 \quad \text{for all } t \in (0, 1). \tag{2.11}$$

Note that $K$ has the representation

$$K(s, t) = \int_X T_s(z) T_t(z) \, dz, \quad s, t \in (0, 1),$$

where $T_s := \chi_{(0,s)} - s$ with $\chi_{(0,s)}$ denoting the characteristic function of $(0, s)$. Arguments similar to those in [14] yield that there exists a constant $C$ such that

$$\int_0^s d\nu(z) = C \quad \text{for all } s \in (0, 1).$$

It follows that $\nu((s_1, s_2)) = 0$ for all $0 < s_1 < s_2 < 1$. Consequently, $\nu$ is the zero Borel measure on $(0, 1)$. Thus, the Brownian bridge kernel satisfies (2.2). □
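The integral representation of the Brownian bridge kernel used in the proof, $K(s,t) = \int_0^1 T_s(z) T_t(z)\,dz$ with $T_s = \chi_{(0,s)} - s$, can be verified by direct quadrature. The following sketch uses a midpoint rule; the function names and the number of nodes are choices made for this illustration.

```python
import numpy as np

def brownian_bridge_kernel(s, t):
    """K(s, t) = min{s, t} - s t on (0, 1)."""
    return min(s, t) - s * t

def T(s, z):
    """T_s(z) = chi_(0,s)(z) - s evaluated at the nodes z in (0, 1)."""
    return (z < s).astype(float) - s

def kernel_by_quadrature(s, t, n=200_000):
    """Midpoint-rule approximation of the integral of T_s * T_t over (0, 1)."""
    z = (np.arange(n) + 0.5) / n
    return float(np.mean(T(s, z) * T(t, z)))

pairs = [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]
errors = [abs(kernel_by_quadrature(s, t) - brownian_bridge_kernel(s, t))
          for s, t in pairs]
```

Since the integrand is piecewise constant, the quadrature error is of the order of the mesh width, so the entries of `errors` are tiny.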



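Finally, we remark that the function $K$ can be regarded as the reproducing kernel for $\mathcal{B}$ constructed by (2.3). To see this, we introduce a bilinear form on $\mathcal{B} \times C_0(X)$ by setting

$$\langle f_\mu, g \rangle := \int_X g(x) \, d\mu(x) \quad \text{for all } \mu \in \mathcal{M}(X) \text{ and } g \in C_0(X).$$

We observe by (2.1) that

$$|\langle f_\mu, g \rangle| \le \|\mu\| \|g\|_{C_0(X)} = \|f_\mu\|_{\mathcal{B}} \|g\|_{C_0(X)}$$

and that for all $x \in X$,

$$f_\mu(x) = \langle f_\mu, K(\cdot, x) \rangle, \quad g(x) = \langle K(x, \cdot), g \rangle.$$

In the above senses, $K$ is said to be the reproducing kernel for both $\mathcal{B}$ and $C_0(X)$.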

3 Error Analysis of the $\ell^1$-Regularization

We apply the constructed space $\mathcal{B}$ to estimate the learning rate of the $\ell^1$-regularized least square regression (1.3) in this section. To this end, we first introduce some standard assumptions in the literature imposed on the regression function $f_\rho$, the input space $X$ and the function $K$.

Let $X$ be a compact metric space with the distance $d$ and assume that $\rho_X$ is a Borel probability measure on $X$. In this note, we suppose that $K$ is a positive-definite reproducing kernel on $X$ with the Lipschitz condition

$$|K(x, t) - K(x, t')| \le C_\alpha (d(t, t'))^\alpha \quad \text{for some positive constants } \alpha, C_\alpha \text{ and for all } x, t, t' \in X. \tag{3.1}$$
Denote for all $r > 0$ by $\mathcal{N}(X, r)$ the least number of open balls with radius $r$ that cover $X$. Assume that this covering number satisfies for some positive constants $\eta$, $C_\eta$ that

$$\mathcal{N}(X, r) \le C_\eta \left( \frac{1}{r} \right)^{\eta} \quad \text{for all } 0 < r \le 1. \tag{3.2}$$

The requirement on $f_\rho$ is that it is contained in the range $\operatorname{ran}(L_K^s)$ of $L_K^s$ for some $s > 0$. Here, $L_K$ is the compact positive operator on $L^2_{\rho_X}$ defined by

$$L_K f := \int_X K(t, \cdot) f(t) \, d\rho_X(t), \quad f \in L^2_{\rho_X}.$$

Let $\phi_j$, $j \in \mathbb{N}$ be an orthonormal basis for $L^2_{\rho_X}$ consisting of eigenfunctions of $L_K$ with the corresponding eigenvalues $\lambda_j \ge \lambda_{j+1}$, $j \in \mathbb{N}$. The assumption $f_\rho \in \operatorname{ran}(L_K^s)$ implies that

$$f_\rho = \sum_{j=1}^{\infty} \lambda_j^s a_j \phi_j$$

for some $h = \sum_{j=1}^{\infty} a_j \phi_j \in L^2_{\rho_X}$. In order to make use of the space constructed in the last section, our last requirement is that $K$ satisfies $\overline{\operatorname{span}}\{K(\cdot, x) : x \in X\} = C(X)$ and condition (2.8).

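Let $c_{\mathbf{z},\lambda}$ be a minimizer of (1.3) and let $f_{\mathbf{z},\lambda}$ be given by (1.6). For the minimization problem (1.3), the hypothesis error and regularization error have the specific forms

$$\mathcal{P}(\mathbf{z}, \lambda, g) := \left( \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) + \lambda \|f_{\mathbf{z},\lambda}\|_{\mathcal{B}} \right) - \left( \mathcal{E}_{\mathbf{z}}(g) + \lambda \|g\|_{\mathcal{B}} \right),$$

$$\mathcal{D}(\lambda, g) := \mathcal{E}(g) - \mathcal{E}(f_\rho) + \lambda \|g\|_{\mathcal{B}},$$

where $g$ is a function in $\mathcal{B}$ to be carefully chosen.

The use of the space $\mathcal{B}$ enables us to discard the hypothesis error immediately.

Lemma 3.1 Under the above assumptions on $K$, there holds $\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \le \mathcal{S}(\mathbf{z}, \lambda, g) + \mathcal{D}(\lambda, g)$ for all $g \in \mathcal{B}$.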



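Proof: By Theorem 2.4,

$$f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{B}} \mathcal{E}_{\mathbf{z}}(f) + \lambda \|f\|_{\mathcal{B}}.$$

As a consequence, $\mathcal{P}(\mathbf{z}, \lambda, g) \le 0$, which together with equation (1.7) completes the proof. □

We next estimate the regularization error.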

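Lemma 3.2 If $0 < s < 1$ then

$$\inf_{g \in \mathcal{B}} \mathcal{D}(\lambda, g) \le \left( \|h\|_{L^2_{\rho_X}} + \|h\|^2_{L^2_{\rho_X}} \right) \lambda^{\frac{2s}{1+s}}. \tag{3.3}$$

If $s \ge 1$ then $f_\rho \in \mathcal{B}$ and

$$\mathcal{D}(\lambda, f_\rho) \le \left( \lambda_1^{s-1} \|h\|_{L^2_{\rho_X}} \right) \lambda. \tag{3.4}$$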



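Proof: Firstly, we have for each $\varphi \in L^2_{\rho_X}$ that $L_K \varphi \in \mathcal{B}$ and by the Cauchy-Schwarz inequality that

$$\|L_K \varphi\|_{\mathcal{B}} = \|\varphi\|_{L^1_{\rho_X}} \le \|\varphi\|_{L^2_{\rho_X}}. \tag{3.5}$$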
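If $s \ge 1$ then $f_\rho = L_K \varphi$ where

$$\varphi := \sum_{j=1}^{\infty} \lambda_j^{s-1} a_j \phi_j.$$

As $\lambda_j$ is non-increasing,

$$\|\varphi\|_{L^2_{\rho_X}} = \left( \sum_{j=1}^{\infty} \lambda_j^{2(s-1)} a_j^2 \right)^{1/2} \le \lambda_1^{s-1} \|h\|_{L^2_{\rho_X}}.$$

We then get by the above equation and (3.5) that

$$\mathcal{D}(\lambda, f_\rho) = \lambda \|f_\rho\|_{\mathcal{B}} \le \lambda \|\varphi\|_{L^2_{\rho_X}} \le \lambda \lambda_1^{s-1} \|h\|_{L^2_{\rho_X}},$$

which is (3.4).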

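Suppose now that $0 < s < 1$. If $\lambda_1 \le \lambda^{\frac{1}{1+s}}$ then by (1.5),

$$\mathcal{D}(\lambda, 0) = \mathcal{E}(0) - \mathcal{E}(f_\rho) = \|f_\rho\|^2_{L^2_{\rho_X}} = \sum_{j=1}^{\infty} \lambda_j^{2s} a_j^2 \le \lambda_1^{2s} \|h\|^2_{L^2_{\rho_X}} \le \lambda^{\frac{2s}{1+s}} \|h\|^2_{L^2_{\rho_X}},$$

which implies (3.3). If $\lambda_1 > \lambda^{\frac{1}{1+s}}$ then since $\lambda_j$ decreases to zero as $j$ tends to infinity, there exists some $N \in \mathbb{N}$ such that $\lambda_{N+1} \le \lambda^{\frac{1}{1+s}} < \lambda_N$. Put

$$\varphi := \sum_{j=1}^{N} \lambda_j^{s-1} a_j \phi_j.$$

It follows from (1.5) and (3.5) that

$$\mathcal{D}(\lambda, L_K \varphi) \le \|L_K \varphi - f_\rho\|^2_{L^2_{\rho_X}} + \lambda \|\varphi\|_{L^2_{\rho_X}}.$$

We estimate that

$$\lambda \|\varphi\|_{L^2_{\rho_X}} = \lambda \left( \sum_{j=1}^{N} \lambda_j^{2(s-1)} a_j^2 \right)^{1/2} \le \lambda \lambda_N^{s-1} \left( \sum_{j=1}^{N} a_j^2 \right)^{1/2} \le \lambda^{1 + \frac{s-1}{1+s}} \|h\|_{L^2_{\rho_X}} = \lambda^{\frac{2s}{1+s}} \|h\|_{L^2_{\rho_X}}$$

and that

$$\|L_K \varphi - f_\rho\|^2_{L^2_{\rho_X}} = \sum_{j=N+1}^{\infty} \lambda_j^{2s} a_j^2 \le \lambda_{N+1}^{2s} \sum_{j=N+1}^{\infty} a_j^2 \le \lambda^{\frac{2s}{1+s}} \|h\|^2_{L^2_{\rho_X}}.$$

Combining the above two inequalities leads to (3.3). The proof is complete. □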
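The spectral truncation in the proof for $0 < s < 1$ can be replayed on a concrete sequence. The sketch below assumes eigenvalues $\lambda_j = j^{-2}$ and coefficients $a_j = 1/j$ for $j \le 200$, both invented for illustration. It keeps the modes with $\lambda_j > \lambda^{1/(1+s)}$ and compares the resulting upper bound on the regularization error with the right-hand side of (3.3).

```python
import numpy as np

def truncation_bound_check(eigenvalues, a, s, lam):
    """Return (upper bound on D(lam, L_K phi), right-hand side of (3.3)).

    eigenvalues: non-increasing positive sequence (lambda_j).
    a: coefficients of h in the eigenbasis, so ||h||^2 = sum of a_j^2.
    """
    threshold = lam ** (1.0 / (1.0 + s))
    h_norm = np.sqrt(np.sum(a ** 2))
    rhs = (h_norm + h_norm ** 2) * lam ** (2.0 * s / (1.0 + s))
    # Modes j <= N; the selection is empty when lambda_1 <= threshold,
    # which corresponds to the choice g = 0 in the proof.
    keep = eigenvalues > threshold
    # ||L_K phi - f_rho||^2: the discarded tail of the expansion of f_rho.
    approx = np.sum(eigenvalues[~keep] ** (2 * s) * a[~keep] ** 2)
    # lam * ||phi||_{L^2}, which dominates lam * ||L_K phi||_B by (3.5).
    penalty = lam * np.sqrt(
        np.sum(eigenvalues[keep] ** (2 * (s - 1)) * a[keep] ** 2))
    return approx + penalty, rhs

j = np.arange(1, 201, dtype=float)
eigenvalues = j ** -2.0        # hypothetical spectrum of L_K
a = 1.0 / j                    # hypothetical coefficients of h
results = [truncation_bound_check(eigenvalues, a, s, lam)
           for s in (0.25, 0.5, 0.9)
           for lam in (1e-4, 1e-2, 0.5, 2.0)]
```

In every tested configuration the first entry of the pair stays below the second, as the lemma predicts.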



We remark that the estimated regularization error in |2(| was of the order 0(\i+°) for < s < 2. 
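Turning to the sampling error, we follow the approach in [?] to decompose it into the sum $\mathcal{S}(\mathbf{z}, \lambda, g) = \mathcal{S}_1(\mathbf{z}, \lambda, g) + \mathcal{S}_2(\mathbf{z}, \lambda)$ where

$$\mathcal{S}_1(\mathbf{z}, \lambda, g) = \left( \mathcal{E}_{\mathbf{z}}(g) - \mathcal{E}_{\mathbf{z}}(f_\rho) \right) - \left( \mathcal{E}(g) - \mathcal{E}(f_\rho) \right), \quad \mathcal{S}_2(\mathbf{z}, \lambda) = \left( \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \right) - \left( \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z},\lambda}) - \mathcal{E}_{\mathbf{z}}(f_\rho) \right).$$

The first summand $\mathcal{S}_1(\mathbf{z}, \lambda, g)$ can be bounded by using the law of large numbers. By the same arguments as those in [?, 20], we use the estimate in Lemma 3.2 to obtain an improved bound.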
Lemma 3.3 Suppose that the output of sample data is bounded by a positive constant almost surely. If $0 < s < 1$ then for each $\varepsilon > 0$ there exists some $g \in \mathcal{B}$ such that for all $0 < \delta < 1$, we have with confidence $1 - \delta/2$ that



Si(z,A,<7) < Cx 

for some positive constant $C_1$. If $s \ge 1$ then $\mathcal{S}_1(\mathbf{z}, \lambda, f_\rho) = 0$.

For $\mathcal{S}_2(\mathbf{z}, \lambda)$, we cite the following result from [?].



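Lemma 3.4 Suppose that (3.1) and (3.2) hold true. If $\lambda < 1$ then we have with confidence $1 - \delta/2$ that

$$\mathcal{S}_2(\mathbf{z}, \lambda) \le \frac{1}{2} \left( \mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \right) + C_2 \frac{\log\frac{2}{\delta} + \log(1 + m)}{\lambda^2 m^{\frac{1}{1 + \eta/\alpha}}}$$

for some positive constant $C_2$.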



Combining Lemmas 3.1, 3.2, 3.3, and 3.4, we reach a new learning rate estimate of the $\ell^1$-regularized least square regression.



Theorem 3.5 Suppose that $X$ satisfies (3.2), the output is bounded by a positive constant almost surely, and $f_\rho \in \operatorname{ran}(L_K^s)$ for some $s > 0$. Let $K$ be a positive-definite reproducing kernel satisfying $\overline{\operatorname{span}}\{K(\cdot, x) : x \in X\} = C(X)$, the condition (2.8) and the Lipschitz condition (3.1). Then there exists some constant $C > 0$ such that with the choice $\lambda = m^{-\frac{1}{2} \frac{1}{1 + \eta/\alpha} \frac{1+s}{1+2s}}$, we have for all $0 < \delta < 1$ with confidence $1 - \delta$ that



£(/»,a) - S(J P ) < Cm log 



and 



S(f,,x)-S(f p )<Cm W^Tloj 



2 + 2m 
1 + m 



w/ten < s < 1 



w/ien s > 1. 



(3.6) 



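Proof: We only discuss the case when $0 < s < 1$ as the other situation is easier and can be shown in a similar way. We choose $\lambda = m^{-\theta}$, $\theta > 0$ and get by Lemmas 3.1, 3.2, 3.3, and 3.4 that there exists some constant $C > 0$ such that with confidence $1 - \delta$

$$\mathcal{E}(f_{\mathbf{z},\lambda}) - \mathcal{E}(f_\rho) \le C m^{-\gamma} \log\frac{2 + 2m}{\delta}, \tag{3.7}$$

where

$$\gamma := \min\left\{ \frac{1}{1 + \eta/\alpha} - 2\theta,\ 1 - \frac{2\theta(1-s)}{1+s},\ \frac{1}{2} - \frac{\theta(1-2s)}{1+s},\ \frac{2\theta s}{1+s} \right\}.$$

The maximum of $\gamma$ is achieved when

$$\theta = \frac{1}{2} \cdot \frac{1}{1 + \eta/\alpha} \cdot \frac{1+s}{1+2s}.$$

Substituting the above choice into (3.7) yields (3.6). □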
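The choice of $\theta$ can be read as a balance of exponents. The sketch below rests on two assumptions: that the regularization error scales like $\lambda^{2s/(1+s)}$, as in (3.3), and that the $\lambda$-dependent part of the sampling bound scales like $\lambda^{-2} m^{-1/(1+\eta/\alpha)}$. Under these assumptions, with $\lambda = m^{-\theta}$, the two contributions decay at the same rate exactly at the stated $\theta$. The function names and parameter values are hypothetical.

```python
def optimal_theta(s, eta, alpha):
    """theta = (1/2) * 1/(1 + eta/alpha) * (1 + s)/(1 + 2s)."""
    return 0.5 * (1.0 / (1.0 + eta / alpha)) * (1.0 + s) / (1.0 + 2.0 * s)

def regularization_exponent(theta, s):
    """Decay exponent in m of lambda^{2s/(1+s)} when lambda = m^{-theta}."""
    return 2.0 * theta * s / (1.0 + s)

def sampling_exponent(theta, eta, alpha):
    """Decay exponent in m of 1 / (lambda^2 * m^{1/(1+eta/alpha)})."""
    return 1.0 / (1.0 + eta / alpha) - 2.0 * theta

# Example parameters: smoothness s, covering exponent eta, Lipschitz exponent alpha.
s, eta, alpha = 0.5, 1.0, 1.0
theta = optimal_theta(s, eta, alpha)
balanced = (regularization_exponent(theta, s),
            sampling_exponent(theta, eta, alpha))
```

For these parameters `theta` is 0.1875 and both exponents equal $\frac{1}{1+\eta/\alpha}\cdot\frac{s}{1+2s} = 0.125$.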



Improvements of the learning rate can be achieved if higher regularity is imposed on the kernel $K$ [?] or better estimates of the sampling error are engaged [11]. Another remark is that the assumption of positive-definiteness and symmetry on $K$ might be abandoned by using the strategy in [20].